Mining Association Algorithm with Threshold based on ROC Analysis

Authors

  • Minoru Kawahara
  • Hiroyuki Kawano
Abstract

The mining association algorithm is one of the most important data mining algorithms for deriving association rules at high speed from huge databases. However, the algorithm tends to derive rules that contain noise such as stopwords, so some systems remove the noise using noise filters. We have been improving the algorithm and using it to develop navigation systems for semi-structured data, and we also use a dictionary to remove noise from the derived association rules. To derive effective rules, it is very important to choose system parameters, such as the threshold values of the minimum support and the minimum confidence, appropriately. We have therefore adapted ROC analysis to the algorithm in our navigation systems and evaluated the performance of the derived rules. In this paper, we import the parameters from ROC analysis into the algorithm to propose extended mining association algorithms. Moreover, we evaluate the performance of our proposed algorithms using an experimental database and show how they can derive effective association rules. We also show that our proposed algorithms can remove stopwords automatically from raw data.
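The abstract does not say how the ROC parameters enter the algorithm, so the following Python sketch only illustrates the general idea of choosing a rule threshold by ROC analysis. It is not the authors' method. The function names, the use of confidence as the rule score, the hand-labeled set of relevant rules, and the selection criterion (maximizing the true positive rate minus the false positive rate, known as Youden's J) are all assumptions made for illustration.

```python
from itertools import combinations


def pair_rule_stats(transactions):
    """Return {(a, b): (support, confidence)} for every one-to-one rule a -> b.

    Hypothetical helper: only rules between single items are considered.
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))
        for a in items:
            item_count[a] = item_count.get(a, 0) + 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    stats = {}
    for (a, b), c in pair_count.items():
        # support = P(a and b); confidence = P(consequent | antecedent)
        stats[(a, b)] = (c / n, c / item_count[a])
        stats[(b, a)] = (c / n, c / item_count[b])
    return stats


def roc_point(scores, relevant, threshold):
    """Return (TPR, FPR) when every rule with score >= threshold is kept.

    `relevant` is an assumed, externally supplied set of rules judged useful.
    """
    tp = sum(1 for r, s in scores.items() if s >= threshold and r in relevant)
    fp = sum(1 for r, s in scores.items() if s >= threshold and r not in relevant)
    pos = sum(1 for r in scores if r in relevant)
    neg = len(scores) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


def best_threshold(scores, relevant):
    """Pick the candidate threshold that maximizes TPR - FPR (Youden's J)."""
    best, best_j = None, float("-inf")
    for t in sorted(set(scores.values())):
        tpr, fpr = roc_point(scores, relevant, t)
        if tpr - fpr > best_j:
            best, best_j = t, tpr - fpr
    return best
```

In use, one would compute `pair_rule_stats` on the transaction data, take the confidence (or support) of each rule as its score, and pass those scores with a labeled sample of relevant rules to `best_threshold`. The returned value could then serve as the minimum-confidence (or minimum-support) threshold in place of a hand-tuned constant.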

Similar resources

Mining Association Algorithm with Improved Threshold Based on ROC Analysis

The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive those rules tha...


A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...


Introducing an algorithm for use to hide sensitive association rules through perturb technique

Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...


Investigating the Effect of Land Use and Soil’s Physio-chemical properties on Wind Erosion Threshold Velocities via Data Mining

Introduction: Wind erosion is a phenomenon that causes severe environmental changes in arid and semi-arid climates. As surface soil texture is very effective in soil erodibility, identifying soil erodibility index is important and efficient. Mismanagement greatly contributes to the development of wind erosion. The velocity that makes the first particles of soil move from the surface is called t...


Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...


Publication date: 2001